Flink: Transform INSERT as one DELETE following one INSERT if configure to use UPSERT #1996
 * All INSERT/UPDATE_AFTER events from input stream will be transformed to UPSERT events, which means it will
 * DELETE the old records and then INSERT the new records. In partitioned table, the partition fields should be
 * a subset of equality fields, otherwise the old row that located in partition-A could not be deleted by the
 * new row that located in partition-B.
Does anything validate this constraint?
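For illustration, a minimal sketch of what such a check could look like. It is not taken from this PR: the class `UpsertKeyCheck` and its methods are hypothetical names, and field names are modeled as plain strings rather than Iceberg schema fields.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of the Iceberg API.
final class UpsertKeyCheck {

    // Returns the partition fields that are NOT covered by the equality fields.
    static Set<String> missingPartitionFields(List<String> partitionFields,
                                              List<String> equalityFields) {
        Set<String> missing = new LinkedHashSet<>(partitionFields);
        missing.removeAll(equalityFields);
        return missing;
    }

    // Fails fast when upsert mode could not delete an old row that lives in
    // a different partition than the new row.
    static void validate(List<String> partitionFields, List<String> equalityFields) {
        Set<String> missing = missingPartitionFields(partitionFields, equalityFields);
        if (!missing.isEmpty()) {
            throw new IllegalStateException(
                "In UPSERT mode, partition fields must be a subset of equality fields; missing: "
                    + missing);
        }
    }
}
```

With equality fields `[id, dt]`, a table partitioned by `dt` would pass, while one partitioned by `region` would be rejected before any rows are written.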
switch (row.getRowKind()) {
  case INSERT:
  case UPDATE_AFTER:
    if (upsert) {
It seems like this should only happen for the INSERT case because UPDATE_AFTER implies that there was an UPDATE_BEFORE that will perform the delete. This would delete the same row twice in that case, causing more equality deletes to be written for the row.
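The double delete can be made concrete with a small simulation. This is only a sketch under stated assumptions: `Kind` is a hypothetical stand-in for Flink's `RowKind`, the writer is reduced to a list of operation names, and the `deleteOnUpdateAfter` flag exists only to compare the two behaviors side by side.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the writer's switch; not the code from this PR.
final class UpsertDeleteCount {

    // Stand-in for Flink's RowKind.
    enum Kind { INSERT, UPDATE_BEFORE, UPDATE_AFTER, DELETE }

    // Lists the operations that would be issued for a changelog sequence.
    static List<String> plan(List<Kind> kinds, boolean upsert, boolean deleteOnUpdateAfter) {
        List<String> ops = new ArrayList<>();
        for (Kind kind : kinds) {
            switch (kind) {
                case INSERT:
                    if (upsert) {
                        ops.add("delete"); // remove any previous row with the same key
                    }
                    ops.add("insert");
                    break;
                case UPDATE_AFTER:
                    if (upsert && deleteOnUpdateAfter) {
                        ops.add("delete"); // redundant if UPDATE_BEFORE already deleted
                    }
                    ops.add("insert");
                    break;
                case UPDATE_BEFORE:
                case DELETE:
                    ops.add("delete");
                    break;
            }
        }
        return ops;
    }
}
```

For the sequence `UPDATE_BEFORE, UPDATE_AFTER` in upsert mode, deleting on `UPDATE_AFTER` as well yields `[delete, delete, insert]`, whereas restricting the extra delete to `INSERT` yields `[delete, insert]`.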
Mostly looks good, but I don't think that upsert should be supported for
I think this is just waiting on someone to pick it up again. UPSERT should be unblocked now that row identifier fields have been added.
Many people export the results of Flink aggregations into Apache Iceberg tables, for example:
This streaming query counts the clicks since the beginning of today (00:00:00); every emitted event is an UPSERT event that overwrites the previously accumulated click_num.
In this case, we need to transform all INSERT/UPDATE_AFTER events into UPSERT events, which means a DELETE followed by an INSERT for the key.
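The intended semantics can be sketched with an in-memory table keyed by day. This is an illustration only: `UpsertTableSketch` and its methods are hypothetical, the key and values are made up, and the map stands in for an Iceberg table with the day as its equality field.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory model of a table written in UPSERT mode.
final class UpsertTableSketch {
    private final Map<String, Long> rows = new LinkedHashMap<>();

    // UPSERT = DELETE by key, then INSERT the new row.
    void upsert(String day, long clickNum) {
        rows.remove(day);        // DELETE the old record, if any
        rows.put(day, clickNum); // INSERT the new record
    }

    // Returns a copy of the current table contents.
    Map<String, Long> current() {
        return new LinkedHashMap<>(rows);
    }
}
```

Upserting the counts 1, 2, and 3 for the same day leaves a single row holding 3, rather than three accumulated rows.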